Abstract: Consider a decision maker who is responsible for
collecting observations so as to enhance his information in a speedy
manner about an underlying phenomenon of interest. The policies
under which the decision maker selects sensing actions can be 
categorized based on the following two factors: i) sequential 
vs. non-sequential; ii) adaptive vs. non-adaptive. Non-sequential 
policies collect a fixed number of observation samples and make 
the final decision afterwards; while under sequential policies, 
the sample size is not known initially and is determined by the 
observation outcomes. Under adaptive policies, the decision maker
relies on the previously collected samples to select the next sensing
action; while under non-adaptive policies, the actions are selected 
independent of the past observation outcomes. 

In this paper, performance bounds are provided for the policies 
in each category. Using these bounds, sequentiality gain and 
adaptivity gain, i.e., the gains of sequential and adaptive selection 
of actions are characterized. 

Index Terms — Active hypothesis testing, performance bounds, 
feedback gain, error exponent. 



I. Introduction 

This paper considers a generalization of the classical
hypothesis testing problem. Suppose there are $M$ hypotheses
among which only one is true. A Bayesian decision maker
is responsible for enhancing his information about the correct
hypothesis in a speedy manner with a small number of samples
while accounting for the penalty of wrong declaration. In
contrast to the classical $M$-ary hypothesis testing problem,
at any given time, our decision maker can choose one of
$K$ available actions and hence, exert some control over the
collected sample's "information content." We refer to this
generalization, originally tackled by Chernoff [1], as the active
hypothesis testing problem. Special cases of active hypothesis
testing naturally arise in a broad spectrum of applications
in cognition [2], communications [3], anomaly detection [4],
image inspection [5], generalized search [6], group testing [7],
and sensor management [8].

The sample size and the sensing actions can be selected
either based on the past observation outcomes (on-line) or
independently of them (off-line or open loop). Accordingly,
the solutions are divided into four categories based
on the following two factors: i) sequential vs. non-sequential;
ii) adaptive vs. non-adaptive. Non-sequential schemes collect
a fixed number of observation samples and make the final
decision afterwards, while under sequential ones, the sample
size is not set in advance and is instead determined by the
specific observations made. Under adaptive policies, the decision
maker relies on the previously collected samples to select the next
sensing action, while under non-adaptive policies, the actions
are selected independently of the past observation outcomes.
A question of both theoretical and practical significance is 
the characterization of the benefits of making sequential and 
adaptive decisions relative to the non-sequential and non- 
adaptive solutions. 

Due to the importance of the question, such gains have been
characterized for many special cases of active hypothesis
testing [5], [9], [10]. For instance, in [9] and [10], simple
sequential and adaptive high dimensional reconstruction and
sparse recovery are shown to significantly outperform
the best non-sequential non-adaptive solutions.
In contrast, [5] identifies scenarios where the gain in practice is
insignificant. In this paper, we consider the problem of active 
hypothesis testing in its full generality and provide upper and 
lower bounds on the expected cost of the optimal sensing 
selection strategies in sequential and non-sequential as well 
as adaptive and non-adaptive classes of policies. Furthermore, 
the bounds are shown to be asymptotically tight (in terms of 
number of samples or equivalently in terms of reliability) and 
logarithmically increasing in the penalty of wrong declaration 
(or equivalently the error probability). 

As simple corollaries, we provide a full characterization of
the sequentiality and adaptivity gains in the general active
hypothesis testing framework. These findings generalize and
extend those of [9] and [10] by showing a logarithmic sequentiality
gain in all cases and an additional logarithmic adaptivity
gain in a large class of practically relevant cases. Furthermore,
the results prove, as a corollary, the conjecture given in [5]
on the insignificance of adaptivity gain when there exists a
"most informative" sensing action which is independent of the
Bayesian prior. Finally, we specialize our results to the active
binary hypothesis testing case and state a simple necessary and
sufficient condition for a logarithmic adaptivity gain.

This work and its analysis are closely related and complementary
to a growing body of literature on hypothesis testing [1], [11]-[19].
We discuss the specific contributions and connections in
Subsection II-D.

The remainder of this paper is organized as follows. In
Section II, we formulate the problem and define various types
of policies for selecting actions. Sections III and IV provide
the main results of the paper and discuss the advantage of
sequential and adaptive selection of actions. In Section V,
active binary hypothesis testing is investigated as a special
case and a necessary and sufficient condition for a logarithmic
adaptivity gain is provided. Finally, we conclude the paper and
discuss future work in Section VI.

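Notations: A random variable is denoted by an upper case
letter (e.g. $X$) and its realization is denoted by a lower case
letter (e.g. $x$). For any set $\mathcal{S}$, $|\mathcal{S}|$ denotes the cardinality
of $\mathcal{S}$. For a set $\mathcal{A}$, let $\Lambda(\mathcal{A})$ denote the collection of all
probability distributions on elements of $\mathcal{A}$, i.e., $\Lambda(\mathcal{A}) =
\{\lambda \in [0,1]^{|\mathcal{A}|} : \sum_{a \in \mathcal{A}} \lambda_a = 1\}$. The Kullback-Leibler (KL)
divergence between two probability density functions $q(\cdot)$ and
$q'(\cdot)$ on space $\mathcal{Z}$ is defined as $D(q\|q') = \int_{\mathcal{Z}} q(z) \log \frac{q(z)}{q'(z)}\,dz$,
with the convention $0 \log \frac{a}{0} = 0$ and $b \log \frac{b}{0} = \infty$ for
$a, b \in [0,1]$ with $b \neq 0$. The Rényi divergence of order
$\alpha$, $\alpha \in [0,1]$, between two probability density functions
$q(\cdot)$ and $q'(\cdot)$ on space $\mathcal{Z}$ is denoted by $D_\alpha(q\|q')$ where
$D_\alpha(q\|q') = \frac{1}{\alpha - 1} \log \int_{\mathcal{Z}} q^{\alpha}(z)\, q'^{\,1-\alpha}(z)\,dz$ for $\alpha \in [0,1)$ and
$D_\alpha(q\|q') = D(q\|q')$ for $\alpha = 1$. Finally, let $\mathcal{N}(m, \sigma^2)$ denote
a normal distribution with mean $m$ and variance $\sigma^2$.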
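
For readers who wish to experiment numerically, the two divergences above can be sketched as follows. This is only an illustration under a simplifying assumption: the paper works with probability density functions, whereas the code takes finite probability vectors. The function names `kl_divergence` and `renyi_divergence` are hypothetical and are not part of the paper.

```python
import math


def kl_divergence(q, q_prime):
    """KL divergence D(q || q') between two discrete distributions."""
    total = 0.0
    for x, y in zip(q, q_prime):
        if x == 0.0:
            continue  # a zero-mass point contributes nothing
        if y == 0.0:
            return math.inf  # convention: b * log(b / 0) = infinity for b > 0
        total += x * math.log(x / y)
    return total


def renyi_divergence(q, q_prime, alpha):
    """Renyi divergence D_alpha(q || q') of order alpha in [0, 1]."""
    if alpha == 1.0:
        return kl_divergence(q, q_prime)
    # Points where either mass is zero contribute nothing to the sum.
    s = sum(x ** alpha * y ** (1.0 - alpha)
            for x, y in zip(q, q_prime) if x > 0.0 and y > 0.0)
    if s == 0.0:
        return math.inf  # disjoint supports
    return math.log(s) / (alpha - 1.0)


# Hypothetical example distributions (assumed for illustration only).
q = [0.5, 0.5]
q_prime = [0.9, 0.1]
alpha = 0.5
scaled_renyi = (1.0 - alpha) * renyi_divergence(q, q_prime, alpha)
kl_bound = min((1.0 - alpha) * kl_divergence(q, q_prime),
               alpha * kl_divergence(q_prime, q))
print(scaled_renyi, kl_bound)
```

For this example the scaled Rényi term does not exceed the smaller of the two scaled KL terms, which is the kind of relationship between the two divergences that the analysis later relies on.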

II. Problem Setup 

In Subsection II-A, we formulate the problem of active
hypothesis testing. Subsection II-B discusses different types
of policies for selecting actions. Subsection II-C explains why
active hypothesis testing is a partially observable Markov
decision problem (POMDP) and provides the sufficient statistic
for this problem. Finally, in Subsection II-D, we state the main
contributions of the paper and provide a summary of related
works.

A. Problem Formulation 

Here, we provide a precise formulation for the active $M$-ary
hypothesis testing problem.

Let $\Omega = \{1, 2, \ldots, M\}$. Let $H_i$, $i \in \Omega$, denote $M$ hypotheses
of interest among which only one holds true. Let $\theta$ be the
random variable that takes the value $\theta = i$ on the event that $H_i$
is true for $i \in \Omega$. We consider a Bayesian scenario with a given
prior (belief) about $\theta$, i.e., initially $P(\{\theta = i\}) = \rho_i(0) > 0$
for all $i \in \Omega$. $\mathcal{A}$ is the set of all sensing actions and is assumed
to be finite with $|\mathcal{A}| = K < \infty$. $\mathcal{Z}$ is the observation space.
For all $a \in \mathcal{A}$, the observation kernel $q_i^a(\cdot)$ (on $\mathcal{Z}$) is the
probability density function for observation $Z$ when action $a$
has been taken and $H_i$ is true. We assume that the observation
kernels $\{q_i^a(\cdot)\}_{i,a}$ are known. Let $L$ denote the penalty for a
wrong declaration, i.e., the penalty of selecting $H_j$, $j \neq i$, when
$H_i$ is true. Let $\tau$ be the (stopping) time at which the decision
maker retires. The objective is to find a stopping time $\tau$, a
sequence of sensing actions $A(0), A(1), \ldots, A(\tau - 1)$, and a
declaration rule $d : \mathcal{A}^\tau \times \mathcal{Z}^\tau \to \Omega$ that collectively minimize
the expected total cost

$$\mathbb{E}\left[\tau + L\,\mathbf{1}_{\{d(A^\tau, Z^\tau) \neq \theta\}}\right], \qquad (1)$$

where the expectation is taken with respect to the initial belief
as well as the distribution of the observation sequence.



Note that in the above problem, the cost of a test is stated
in terms of the expected sample size plus the
expected penalty of wrong declaration. We are interested in the
characterization of this cost as a function of the penalty $L$. It is easy
to show that under the optimal selection rule, the probability of
error approaches zero as $L$ approaches infinity. Furthermore, as
shown in [20], the above problem is (asymptotically) equivalent
to the problem of minimizing the (expected) number of samples
subject to a constraint $\epsilon = (L \log L)^{-1}$ on the expected
probability of error.

B. Types of Policies 

A policy is a rule based on which the stopping time $\tau$ and
sensing actions $A(t)$, $t = 0, 1, \ldots, \tau - 1$, are selected. We assume
that sensing actions are selected according to a randomized
decision rule $\lambda \in \Lambda(\mathcal{A})$ whose element $\lambda_a$ indicates the probability
of selecting sensing action $a$ and which, in general, may or may not
change with time. The sensing actions and the stopping time can
be selected either based on the past observation outcomes or
independently of them. Accordingly, policies are
divided into four categories based on the following two factors:
i) sequential vs. non-sequential; ii) adaptive vs. non-adaptive.
Non-sequential policies collect a fixed number of observation
samples and make the final decision afterwards, while under
sequential policies, the sample size is not known initially and
is determined by the observation outcomes. More precisely,
under non-sequential policies, $\tau = N$ for some $N \in \mathbb{N}$, while
for sequential policies, $\tau$ is a random stopping time. Under
adaptive policies, the decision maker relies on the previously
collected samples to select the next sensing action, while under
non-adaptive policies, the actions are selected independently of
the past observation outcomes.

C. Information State as Sufficient Statistic 

The problem of active $M$-ary hypothesis testing is a partially
observable Markov decision problem (POMDP) where the state
is static and observations are noisy. It is known that any
POMDP is equivalent to an MDP with a compact yet uncountable
state space, for which the belief of the decision maker
about the underlying state becomes an information state [21].
In our setup, thus, the information state at time $t$ is nothing
but a belief vector specified by the conditional probabilities
of hypotheses $H_1, H_2, \ldots, H_M$ being true given the initial
belief and all the previous observations and actions. Let $\rho(t)$
denote the posterior belief after $t$ observations. Accordingly,
the information state space is defined as $\mathbb{P}(\Theta) = \{\rho \in [0,1]^M :
\sum_{i=1}^{M} \rho_i = 1\}$ where $\Theta$ is the $\sigma$-algebra generated by the random
variable $\theta$. In one sensing step, the evolution of the belief
vector follows Bayes' rule and the expected total cost (1) can
be rewritten as

$$\mathbb{E}[\tau] + L\,\mathrm{Pe}, \qquad (2)$$

where $\mathrm{Pe} = \mathbb{E}[1 - \max_{j \in \Omega} \rho_j(\tau)]$ is the probability of wrong
declaration and the expectations are taken with respect to
the distribution of the observation sequence as well as the prior
distribution on $\theta$.

Let $V_{NN}(\rho)$, $V_{SN}(\rho)$, $V_{SA}(\rho)$, and $V_{NA}(\rho)$ denote the
minimum expected total cost (1) for prior belief $\rho$ under
non-sequential non-adaptive, sequential non-adaptive, sequential
adaptive, and non-sequential adaptive policies, respectively.
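
The one-step evolution of the belief vector and the error term appearing in (2) can be illustrated with a short sketch. It assumes that the likelihood of the latest observation under each hypothesis has already been evaluated for the chosen action. The helper names `bayes_update` and `error_probability` are hypothetical.

```python
def bayes_update(belief, likelihoods):
    """Return the posterior belief after one observation.

    belief[i] is the current probability of hypothesis i and likelihoods[i]
    is the value of the observation kernel under hypothesis i for the action
    that was taken (assumed to be precomputed by the caller).
    """
    unnormalized = [b * w for b, w in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under every hypothesis")
    return [u / total for u in unnormalized]


def error_probability(belief):
    """Probability of a wrong declaration when the most likely hypothesis
    is declared at the given belief, i.e., 1 - max_j belief[j]."""
    return 1.0 - max(belief)


# Hypothetical numbers: uniform prior over two hypotheses, and an observation
# that is four times as likely under the first hypothesis.
prior = [0.5, 0.5]
posterior = bayes_update(prior, [0.8, 0.2])
print(posterior, error_probability(posterior))
```

Averaging `error_probability` over observation sequences at the stopping time would give the quantity denoted by Pe in (2); the sketch only evaluates it for a single belief.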

D. Overview of the Results and Literature Survey 

Active hypothesis testing generalizes the passive (classical)
hypothesis testing problem where the number of sensing actions
is limited to one, both in the fixed sample size (non-sequential)
case [14], [15], [22] as well as the sequential
one [11]-[13]. While the fixed sample size studies have primarily
focused on the asymptotic analysis in the form of identifying
error exponents for various error types [14], [15], [22], the
study of sequential hypothesis testing has come in the form of
identifying the expected optimal sample size to achieve a given
error probability.

The generalization to the active testing case was considered
by Chernoff in [1] in which a decision maker controls
sensing actions to optimize the expected total cost (1) in a
sequential (variable sample size) setting. In particular, in [1]
and its extensions [18], [20], [23], heuristic sequential adaptive
randomized policies were proposed and were shown to be
asymptotically optimal as $L \to \infty$, where the notion of
asymptotic optimality [1] denotes the relative tightness of the
performance upper bound associated with the proposed policy
and the lower bound associated with the optimal policy.¹

The general active binary hypothesis testing problem was
recently studied in [16], [17] where a full characterization of
the error exponent corresponding to the class of adaptive
and non-adaptive policies was provided. In particular, the
error exponents corresponding to these two classes were shown
to be equal, hence establishing zero adaptivity gain among
non-sequential policies. The generalization to $M > 2$ was
considered in [18]. Note that while [18] fully characterizes the
error exponent corresponding to non-sequential non-adaptive
policies, it provides only a partial characterization of (i.e., loose
upper and lower bounds on) the error exponent corresponding
to non-sequential adaptive policies.

Table I provides a visual summary of the literature on
hypothesis testing, excluding the authors' prior work, as discussed
above.

We close our literature survey with an overview of the main
contributions of this paper, which expands our previous works
[20], [23]-[25] and unifies various aspects of the prior work:
• We provide asymptotically tight lower and upper bounds
on $V_{NN}(\rho)$, $V_{SN}(\rho)$, and $V_{SA}(\rho)$ which hold uniformly
for all priors $\rho \in \mathbb{P}(\Theta)$.
- The asymptotically tight bounds on $V_{NN}(\rho)$ rely on
the analysis of [14], [15] and the realization that in
order to minimize the total cost, we have to decrease
¹In [1], the objective was to minimize $c\,\mathbb{E}[\tau] + \mathrm{Pe}$ and the proposed policy
was shown to be asymptotically optimal as $c \to 0$. It is straightforward to
show that for $L = 1/c$, this problem coincides with the active hypothesis testing
problem defined in this paper. However, we have chosen $\mathbb{E}[\tau] + L\,\mathrm{Pe}$ as the
objective function here because of its interpretation as a Lagrangian relaxation of
an information acquisition problem in which the objective is to minimize $\mathbb{E}[\tau]$
subject to $\mathrm{Pe} \leq \epsilon$, where $\epsilon > 0$ denotes the desired probability of error.



TABLE I
Hypothesis Testing Literature

Type                            | M = 2    | M > 2
Sequential Passive (K = 1)      |          | (HI, (SI
Sequential Non-adaptive         |          |
Sequential Adaptive             | (D, El   | (D. (H
Non-sequential Passive (K = 1)  | (Ml      | da
Non-sequential Non-adaptive     | oa, Gil  | da
Non-sequential Adaptive         |          | da



the error probabilities of various types with the same
exponent among the worst pair of hypotheses. Since,
unlike the passive case studied in [14], [15], the
non-adaptive policies produce non-i.i.d. observation samples,
the final step is to characterize the relationship
between the error exponent of a fixed block length
and the one-step error exponent.

- The asymptotically tight bounds on $V_{SN}(\rho)$ extend the
results obtained by [13] to the Bayesian context
while allowing for randomized non-adaptive policies.
More specifically, the result of [13] is obtained via
the law of large numbers and only holds if the
observations are i.i.d. Since the observations here are not
identically distributed (although they are independent), a different
proof technique is required (note that unlike the
non-sequential case of extending the work of [14], [15],
the random nature of the sample size in the sequential
case does not allow for a predetermined relationship
between the error exponent of a fixed block and the
one-step error exponent).

- The asymptotically tight bounds on $V_{SA}(\rho)$ extend those
obtained by Chernoff [1] to the Bayesian context
while relaxing the assumption on uniform discrimination
of hypotheses or the need for the infinitely
often reliance on randomized actions deployed in [18]
to ensure sufficient discrimination among hypotheses.

• In addition, we partially characterize a lower bound for
$V_{NA}(\rho)$. This is, in the Bayesian context, similar to the
partial characterization of the error exponent in [18].

• As corollaries to the above performance bounds, we
characterize the sequentiality gain and adaptivity gain in
terms of $L$. In particular, it is shown that the sequentiality
gain grows logarithmically as the penalty $L$ increases.
We also state a simple necessary and sufficient condition
ensuring a logarithmic adaptivity gain in $L$ for the active
binary hypothesis testing case.

• Furthermore, primarily as a sanity check, Subsection IV-B
contains the maximum achievable error exponents $E_{NN}$,
$E_{SN}$, and $E_{SA}$ in the Bayesian context. In particular,
our result regarding $E_{NN}$ coincides with that of [16]-[18],
while the result regarding $E_{SA}$ coincides with that
of [1], [18] in the Bayesian context. To the best of our
knowledge, the result on $E_{SN}$ is new and has not been
established before, while our upper bound on $E_{NA}$ is
subsumed by the analysis in [18].

III. Analytic Results 
In this section, we provide the main results of the paper
regarding the asymptotic characterization (in $L$) of $V_{NN}(\rho)$,
$V_{SN}(\rho)$, $V_{SA}(\rho)$, and $V_{NA}(\rho)$.

A. Assumptions and Basic Definitions 

Throughout the paper, we make the following technical
assumptions.

Assumption 1. For any two hypotheses $i$ and $j$, $i \neq j$, there
exists an action $a \in \mathcal{A}$ such that $D(q_i^a \| q_j^a) > 0$.

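Assumption 2. There exists $\xi < \infty$ such that

$$\max_{i,j \in \Omega} \max_{a \in \mathcal{A}} \sup_{z \in \mathcal{Z}} \frac{q_i^a(z)}{q_j^a(z)} \leq \xi.$$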

Assumption 1 ensures the possibility of discrimination between
any two hypotheses. Assumption 2 implies that no two
hypotheses are fully distinguishable using a single observation
sample.

To continue with our analysis, we need the following defi- 
nitions and notations. 

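Definition. For all $i \in \Omega$, $\lambda \in \Lambda(\mathcal{A})$, the optimized discrimination
of hypothesis $i$ under randomized rule $\lambda$ is defined as

$$D^*(i, \lambda) := \min_{j \neq i} \max_{\alpha \in [0,1]} (1 - \alpha) \sum_{a \in \mathcal{A}} \lambda_a D_\alpha(q_i^a \| q_j^a).$$

Definition. For all $i \in \Omega$, $\lambda \in \Lambda(\mathcal{A})$, the reliability function
of hypothesis $i$ with regard to randomized rule $\lambda$ is defined as

$$R(i, \lambda) := \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a),$$

and the maximal randomized rule for hypothesis $i$ is denoted
by

$$\lambda_i^* := \arg\max_{\lambda \in \Lambda(\mathcal{A})} R(i, \lambda).$$

For $\lambda \in \Lambda(\mathcal{A})$, let $\bar{R}(\lambda)$ denote the harmonic mean of
$\{R(i, \lambda)\}_{i \in \Omega}$, i.e.,

$$\bar{R}(\lambda) := \frac{M}{\sum_{i=1}^{M} \frac{1}{R(i, \lambda)}},$$

and let $R^*$ denote the harmonic mean of $\{R(i, \lambda_i^*)\}_{i \in \Omega}$, i.e.,

$$R^* := \frac{M}{\sum_{i=1}^{M} \frac{1}{R(i, \lambda_i^*)}}.$$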

These notions of discrimination and reliability, as we will
see, are natural (and Bayesian) extensions of reliability in classical
detection [22] where the reliability function for hypothesis $i$
is related to the type $i$ error probability. The following fact enables
a concrete relationship between these notions.
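
The reliability function and its harmonic mean defined above can be evaluated directly once the pairwise KL divergences are tabulated. The sketch below assumes such a table is given as a nested list `kl[a][i][j]` holding the divergence between the kernels of hypotheses `i` and `j` under action `a`. The numbers and all function names are hypothetical.

```python
def reliability(i, lam, kl):
    """R(i, lambda): min over j != i of sum_a lambda_a * kl[a][i][j]."""
    num_hyp = len(kl[0])
    return min(
        sum(w * kl[a][i][j] for a, w in enumerate(lam))
        for j in range(num_hyp) if j != i
    )


def harmonic_mean(values):
    """Harmonic mean of strictly positive numbers."""
    return len(values) / sum(1.0 / v for v in values)


def mean_reliability(lam, kl):
    """Harmonic mean over hypotheses of R(i, lambda)."""
    num_hyp = len(kl[0])
    return harmonic_mean([reliability(i, lam, kl) for i in range(num_hyp)])


# Hypothetical divergence table for two actions and two hypotheses:
# action 0 separates hypothesis 0 from 1 well, action 1 does the opposite.
kl = [
    [[0.0, 2.0], [0.5, 0.0]],  # action 0
    [[0.0, 0.5], [2.0, 0.0]],  # action 1
]
uniform_rule = [0.5, 0.5]
print(reliability(0, uniform_rule, kl), mean_reliability(uniform_rule, kl))
```

In this toy table each hypothesis has its own best action, so the per-hypothesis maximal rules differ: `reliability(0, [1.0, 0.0], kl)` and `reliability(1, [0.0, 1.0], kl)` both exceed what the uniform rule achieves for either hypothesis.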

Fact 1 (Theorem 1 in [26]). For two probability density
functions $q(\cdot)$ and $q'(\cdot)$ with the same support and for all
$\alpha \in [0,1]$ we have

$$(1 - \alpha) D_\alpha(q\|q') \leq \min\{(1 - \alpha) D(q\|q'),\ \alpha D(q'\|q)\}.$$



B. Main Theorems 

In this subsection, we provide upper and lower bounds on
the minimum expected total cost (1) under different types of
policies defined in Subsection II-B. These bounds will then be used
in Section IV to characterize the gains of sequential and
adaptive selection of actions.

Theorem 1 (Non-sequential non-adaptive policy). Under
Assumptions 1 and 2,

$$V_{NN}(\rho) \leq \frac{\log L - \min \log(\cdots)}{D} + o(\log L), \qquad (3)$$

$$V_{NN}(\rho) \geq \frac{\log L - \max \log(\cdots)}{D} - o(\log L), \qquad (4)$$

where

$$D := \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} D^*(i, \lambda). \qquad (5)$$

Proof: The detailed proof is provided in Appendix A. Here
we provide an overview.

The proof of the lower bound relies on a generalization of
Theorem 10 in [14], while the upper bound is achieved via
a randomized, non-sequential, and non-adaptive policy which
collects $n = \lceil (\log L + \log(M-1) - \min \log(\cdots) + o(\log L)) / D \rceil$
samples (deterministically) and selects sensing actions according
to the randomization rule $\lambda \in \Lambda(\mathcal{A})$ that achieves the
maximum in (5). ■

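Theorem 2 (Sequential non-adaptive policy). Under
Assumptions 1 and 2,

$$V_{SN}(\rho) \leq \min_{\lambda \in \Lambda(\mathcal{A})} \sum_{i=1}^{M} \rho_i \frac{\log L - \min \log(\cdots)}{R(i, \lambda)} + o(\log L), \qquad (6)$$

$$V_{SN}(\rho) \geq \min_{\lambda \in \Lambda(\mathcal{A})} \sum_{i=1}^{M} \rho_i \frac{\log L - \max \log(\cdots)}{R(i, \lambda)} - o(\log L). \qquad (7)$$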
Proof: The detailed proof is provided in Appendix B. Here
we provide an overview.

Suppose $\lambda \in \Lambda(\mathcal{A})$ achieves the minimum in (6). The upper
bound (6) is achieved by a policy that selects sensing actions
according to $\lambda$ and stops sampling at

$$\tau := \min\{n : \max_{i \in \Omega} \rho_i(n) \geq 1 - L^{-1}\}.$$

From the upper bound (6) we know that the total cost under
the optimal policy is $O(\log L)$. This implies that the error
probability $\mathrm{Pe}$ of the optimal policy is $O(\frac{\log L}{L})$. Hence, without
loss of generality in our proof of the lower bound, we can
restrict the set of sequential and non-adaptive policies to those
whose average probability of making an error is $O(\frac{\log L}{L})$.
Conditioning on the true hypothesis and considering the dynamics
of pairwise likelihoods, we then compute the minimum
expected number of samples necessary to achieve this target
error probability. ■
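
The policy used for the upper bound in the proof overview above (draw actions from a fixed randomized rule and stop once the largest posterior reaches $1 - L^{-1}$) can be simulated in a few lines. The sketch below is illustrative only and makes several assumptions that are not in the paper: observations are binary, `success_prob[a][i]` is the probability of observing 1 under action `a` and hypothesis `i`, and a step cap is added as a safeguard. All names are hypothetical.

```python
import random


def run_sequential_nonadaptive(true_hyp, prior, lam, success_prob, penalty, rng,
                               max_steps=100000):
    """Simulate one run; return (number of samples, declared hypothesis, belief)."""
    belief = list(prior)
    actions = list(range(len(lam)))
    threshold = 1.0 - 1.0 / penalty
    n = 0
    while max(belief) < threshold and n < max_steps:
        # Non-adaptive: the action is drawn from lam regardless of the past.
        a = rng.choices(actions, weights=lam)[0]
        # Binary observation generated under the true hypothesis.
        z = 1 if rng.random() < success_prob[a][true_hyp] else 0
        like = [p if z == 1 else 1.0 - p for p in success_prob[a]]
        # Bayes' rule update of the belief vector.
        unnorm = [b * w for b, w in zip(belief, like)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        n += 1
    decision = max(range(len(belief)), key=lambda i: belief[i])
    return n, decision, belief


rng = random.Random(0)
success_prob = [[0.8, 0.2], [0.6, 0.4]]  # hypothetical kernels, strictly in (0, 1)
tau, decision, belief = run_sequential_nonadaptive(
    0, [0.5, 0.5], [0.5, 0.5], success_prob, 100.0, rng)
print(tau, decision, max(belief))
```

Averaging `tau` over many runs and over the true hypothesis, for increasing values of the penalty, gives an empirical view of the logarithmic growth in $L$ that the theorem describes.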
Theorem 3 (Sequential adaptive policy). Under Assumptions 1
and 2,

$$V_{SA}(\rho) \leq \sum_{i=1}^{M} \rho_i \frac{\log L - \min \log(\cdots)}{R(i, \lambda_i^*)} + o(\log L), \qquad (8)$$

$$V_{SA}(\rho) \geq \sum_{i=1}^{M} \rho_i \frac{\log L - \max \log(\cdots)}{R(i, \lambda_i^*)} - o(\log L). \qquad (9)$$

Proof: The detailed proof is provided in Appendix C. Here
we provide an overview.

The proof of the lower bound relies on a generalization
of Theorem 2 in [1]. The upper bound is achieved via $\pi_1$,
a heuristic two-phase policy introduced in [20] which, in its
first phase, selects actions in a way that all pairs of hypotheses
can be distinguished from each other, while its second phase
coincides with Chernoff's scheme [1] where only the pairs
including the most likely hypothesis are considered. In [20], the
second phase of $\pi_1$ is shown to ensure its asymptotic optimality
in $L$, while its first phase in a very natural manner relaxes the
technical assumption in [1] where all actions are assumed to
discriminate between all hypothesis pairs, or the need for the
infinitely often reliance on randomized actions deployed in [18]
in order to ensure sufficient discrimination among hypotheses.

■

We close this section with a note on the class of non-sequential
adaptive policies, even though they seem rather unnatural to us
(it is more reasonable to control the sample size using the
observation outcomes if they are already being used to select
sensing actions). The next proposition provides a lower bound on
the minimum expected total cost under non-sequential adaptive
policies, denoted by $V_{NA}$.

Proposition 1 (Non-sequential adaptive policy). Under
Assumptions 1 and 2,

$$V_{NA}(\rho) \geq \frac{\log L - \max \log(\cdots)}{\min_{i \in \Omega} \max_{\lambda \in \Lambda(\mathcal{A})} R(i, \lambda)} - o(\log L). \qquad (10)$$

Next we state and discuss the consequences of the bounds
proposed above. In Subsection IV-A, we focus on the advantages
of causally selecting the retire/declaration time as well as
the adaptive selection of sensing actions. In Subsection IV-B,
we derive the error exponents corresponding to different types
of policies.

IV. Consequences of the Bounds 

In this section, we first specialize and simplify the results
provided in Section III for the uniform prior. In particular, assume
that the hypotheses are initially equally likely, i.e., $\rho_i(0) = \frac{1}{M}$
for all $i \in \Omega$. Let $\mathbb{E}[\tau^*_{NN}]$, $\mathbb{E}[\tau^*_{SN}]$, and $\mathbb{E}[\tau^*_{SA}]$ denote the
minimum expected number of samples under non-sequential
non-adaptive, sequential non-adaptive, and sequential adaptive
policies, while $\mathrm{Pe}_{NN}$, $\mathrm{Pe}_{SN}$, and $\mathrm{Pe}_{SA}$ represent the corresponding
average probabilities of making a wrong declaration.

From Fact 1 we know that

$$D \leq 0.5 \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda_a D(q_i^a \| q_j^a). \qquad (11)$$

Theorem 1 together with (11) implies that:

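Corollary 1 (Non-sequential non-adaptive policy). Under
Assumptions 1 and 2,

$$\mathbb{E}[\tau^*_{NN}] + L\,\mathrm{Pe}_{NN} = \frac{\log L}{D} \pm o(\log L) \geq \frac{2 \log L}{\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} R(i, \lambda)} - o(\log L). \qquad (12)$$

Corollary 2 (Sequential non-adaptive policy). Under
Assumptions 1 and 2,

$$\mathbb{E}[\tau^*_{SN}] + L\,\mathrm{Pe}_{SN} = \frac{\log L}{\max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda)} \pm o(\log L). \qquad (13)$$

Corollary 3 (Sequential adaptive policy). Under
Assumptions 1 and 2,

$$\mathbb{E}[\tau^*_{SA}] + L\,\mathrm{Pe}_{SA} = \frac{\log L}{R^*} \pm o(\log L). \qquad (14)$$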

Remark 1. Note that the simple two-phase structure of the
policy which achieves the upper bound in (8) implies that the
adaptivity gain can be obtained via coarse-level adaptation.

From the results above, it is evident that the minimum
expected total cost under all classes of policies grows
logarithmically in $L$. However, the coefficient of the $\log L$ term is
not the same in general and we have

$$R^* \geq \max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda) \geq \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} R(i, \lambda) \geq D. \qquad (15)$$

A. Sequentiality and Adaptivity Gains 

In this subsection, we discuss the advantage of causally
selecting the retire/declaration time, i.e., $\tau$, as well as the
sensing actions. Let $V_{NN}$, $V_{SN}$, and $V_{SA}$, respectively, denote
the minimum expected total cost under non-sequential
non-adaptive, sequential non-adaptive, and sequential adaptive
policies under the uniform prior, i.e., $V_x := V_x([\frac{1}{M}, \frac{1}{M}, \ldots, \frac{1}{M}])$
where $x$ denotes the class of policies $NN$, $SN$, and $SA$.

First, we show that the performance gap between the sequential
and non-sequential policies, $V_{NN} - V_{SN}$, grows logarithmically
as the penalty $L$ increases. We refer to this performance
gap as the sequentiality gain.

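Corollary 4. Under Assumptions 1 and 2, the sequentiality
gain is characterized as

$$V_{NN} - V_{SN} \geq \log L \left( \frac{2}{\max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} R(i, \lambda)} - \frac{1}{\max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda)} \right) - o(\log L).$$

Remark 2. The sequentiality gain grows logarithmically
with $L$ and, from (15),

$$V_{NN} - V_{SN} \geq \frac{\log L}{\max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda)} - o(\log L).$$
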
Next, the advantage of adaptively selecting the sensing
actions is discussed. In particular, it is shown that the performance
gap between the adaptive and non-adaptive policies,
$V_{SN} - V_{SA}$, grows logarithmically as the penalty $L$ increases.
We refer to this performance gap as the adaptivity gain.

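Corollary 5. Under Assumptions 1 and 2, the adaptivity gain
is characterized as

$$V_{SN} - V_{SA} = \log L \left( \frac{1}{\max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda)} - \frac{1}{R^*} \right) \pm o(\log L).$$

Remark 3. Unless there exists a $\lambda \in \Lambda(\mathcal{A})$ such that

$$R(i, \lambda) = R(i, \lambda_i^*) \quad \text{for all } i \in \Omega,$$

the adaptivity gain grows logarithmically with $L$.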
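
To make the coefficients of $\log L$ in the sequentiality and adaptivity gains concrete, the sketch below evaluates them by a grid search over randomized rules. It is a rough numerical illustration under stated assumptions: exactly two sensing actions, a hypothetical table `kl[a][i][j]` of pairwise KL divergences, and a finite grid in place of an exact maximization. The function names are hypothetical.

```python
def reliability(i, lam, kl):
    """R(i, lambda): min over j != i of sum_a lambda_a * kl[a][i][j]."""
    num_hyp = len(kl[0])
    return min(
        sum(w * kl[a][i][j] for a, w in enumerate(lam))
        for j in range(num_hyp) if j != i
    )


def harmonic_mean(values):
    """Harmonic mean, taken to be 0 when any entry is not positive."""
    if min(values) <= 0.0:
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def gain_coefficients(kl, grid=1000):
    """Grid search over rules (t, 1 - t); valid for two actions only."""
    num_hyp = len(kl[0])
    rules = [[k / grid, 1.0 - k / grid] for k in range(grid + 1)]
    table = [[reliability(i, lam, kl) for i in range(num_hyp)] for lam in rules]
    # R*: each hypothesis uses its own best rule.
    r_star = harmonic_mean([max(row[i] for row in table) for i in range(num_hyp)])
    # Best single rule in the harmonic-mean sense.
    max_r_bar = max(harmonic_mean(row) for row in table)
    # Best single rule in the worst-hypothesis sense.
    max_min_r = max(min(row) for row in table)
    return r_star, max_r_bar, max_min_r


kl = [
    [[0.0, 2.0], [0.5, 0.0]],  # action 0 (hypothetical values)
    [[0.0, 0.5], [2.0, 0.0]],  # action 1 (hypothetical values)
]
r_star, max_r_bar, max_min_r = gain_coefficients(kl)
adaptivity_coeff = 1.0 / max_r_bar - 1.0 / r_star
sequentiality_coeff = 2.0 / max_min_r - 1.0 / max_r_bar
print(r_star, max_r_bar, max_min_r, adaptivity_coeff, sequentiality_coeff)
```

In this toy table no single rule is simultaneously maximal for both hypotheses, so the adaptivity coefficient is strictly positive, in line with Remark 3.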

A sufficient condition under which there is no adaptivity
gain is that of stochastic dominance/degradation [27], i.e., if
there exist a stochastic transformation² $W$ from $\mathcal{Z}$ to $\mathcal{Z}$ and a
sensing action $a^*$ such that for all other sensing actions $a \in \mathcal{A}$,

$$q_i^a(z) = \int q_i^{a^*}(y)\, W(y; z)\,dy, \quad \forall i \in \Omega. \qquad (16)$$

As shown by Sakaguchi [28], (16) implies that

$$D(q_i^a \| q_j^a) \leq D(q_i^{a^*} \| q_j^{a^*}), \quad \forall a \in \mathcal{A},\ \forall i, j \in \Omega,$$

hence ensuring zero adaptivity gain when the observations obtained
by all actions are stochastically degraded versions of the
observation under sensing action $a^*$. This formalizes the notion
of informativeness and confirms the conjecture provided in [5].

B. Reliability and Error Exponent 

Let $\mathbb{E}^\pi[\tau]$ denote the expected stopping time (or equivalently
the expected number of collected samples) under policy $\pi$.
Policy $\pi$ is said to achieve error exponent $E > 0$ if

$$\lim_{t \to \infty} -\frac{1}{t} \log \mathrm{Pe}^\pi(t, M) = E, \qquad (17)$$

where $\mathrm{Pe}^\pi(t, M)$ is the smallest probability of error that policy
$\pi$ can guarantee when looking for the true hypothesis among
$M$ hypotheses with $\mathbb{E}^\pi[\tau] \leq t$ (note that for non-sequential
policies, $\tau$ is deterministic).

Next we use the bounds obtained in Section III to characterize
the maximum achievable error exponent for different
types of policies. Let $E_{NN}$, $E_{SN}$, $E_{SA}$, and $E_{NA}$ denote the
maximum achievable error exponent under non-sequential
non-adaptive, sequential non-adaptive, sequential adaptive, and
non-sequential adaptive policies.

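Corollary 6. Under Assumptions 1 and 2, we have

$$E_{NN} = D, \qquad E_{SN} = \max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda), \qquad E_{SA} = R^*.$$

²A function $W : \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}_+$ is called a stochastic transformation from $\mathcal{Z}$
to $\mathcal{Z}$ if it satisfies $\int_{\mathcal{Z}} W(y; z)\,dz = 1$.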



Remark 4. The above characterizations of the maximum achievable
error exponent are nothing but the Bayesian and $M$-ary
version of the results in the literature (see Table I). In fact,
as discussed in Subsection II-D, these results provide a sanity
check vis-à-vis the prior work: $E_{NN}$ coincides with that of
[16]-[18], while $E_{SA}$ coincides with that of [1], [18]. To the
best of our knowledge, the result on $E_{SN}$ is new and has not
been established before.

Remark 5. The above corollary provides an alternative means
to underline and characterize the sequentiality and adaptivity
gains. In particular, sequentiality always results in an improvement
in the maximum achievable error exponent since
$E_{NN} \leq 0.5 \max_{\lambda \in \Lambda(\mathcal{A})} \min_{i \in \Omega} R(i, \lambda) < E_{SN}$. In contrast, adaptive
selection of actions results in an improvement in the maximum
achievable error exponent only if $\max_{\lambda \in \Lambda(\mathcal{A})} \bar{R}(\lambda) \neq R^*$.

We can also find an upper bound on the maximum achievable
error exponent of any non-sequential yet adaptive policy (tight
lower bounds are necessary for full characterization, however).

Corollary 7. Under Assumptions 1 and 2, we have

$$E_{NA} \leq \min_{i \in \Omega} \max_{\lambda \in \Lambda(\mathcal{A})} R(i, \lambda).$$

Remark 6. Our upper bound on $E_{NA}$ is subsumed by [18,
Theorem 3].



V. Special Case: Binary Hypothesis Testing

In this section, we consider active binary hypothesis testing 
(M = 2) as a special case. 



A. Analytical Results 

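The performance bounds provided in Section III are simplified
by substituting the following equations into the denominators
of the bounds.

$$R(1, \lambda) = \sum_{a \in \mathcal{A}} \lambda_a D(q_1^a \| q_2^a), \qquad R(2, \lambda) = \sum_{a \in \mathcal{A}} \lambda_a D(q_2^a \| q_1^a),$$

$$R(1, \lambda_1^*) = \max_{a \in \mathcal{A}} D(q_1^a \| q_2^a), \qquad R(2, \lambda_2^*) = \max_{a \in \mathcal{A}} D(q_2^a \| q_1^a),$$

$$\bar{R}(\lambda) = \left( \frac{0.5}{\sum_{a \in \mathcal{A}} \lambda_a D(q_1^a \| q_2^a)} + \frac{0.5}{\sum_{a \in \mathcal{A}} \lambda_a D(q_2^a \| q_1^a)} \right)^{-1},$$

$$R^* = \left( \frac{0.5}{\max_{a \in \mathcal{A}} D(q_1^a \| q_2^a)} + \frac{0.5}{\max_{a \in \mathcal{A}} D(q_2^a \| q_1^a)} \right)^{-1}.$$

Next we state a simple necessary and sufficient condition for
a logarithmic adaptivity gain in the active binary hypothesis
testing case.

Corollary 8. In the active binary hypothesis testing case, the
adaptivity gain grows logarithmically in $L$ if and only if

$$\arg\max_{a \in \mathcal{A}} D(q_1^a \| q_2^a) \neq \arg\max_{a \in \mathcal{A}} D(q_2^a \| q_1^a).$$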
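
The condition of Corollary 8 is easy to check numerically when the observation kernels are Gaussian, since the KL divergence between two normal distributions has a closed form. The sketch below is illustrative only: the kernel parameters are hypothetical and are not those of the numerical example in this section, the function names are invented for this sketch, and ties are handled by treating each arg max as a set, with a logarithmic gain declared when the two sets share no action (this reading of ties is an assumption).

```python
import math


def kl_gauss(m1, var1, m2, var2):
    """KL divergence between N(m1, var1) and N(m2, var2), in nats."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (m1 - m2) ** 2) / var2 - 1.0)


def has_log_adaptivity_gain(kernels):
    """kernels[a] = ((m1, var1), (m2, var2)): Gaussian kernels of the two
    hypotheses under action a. Returns True when no action maximizes both
    divergence directions at once."""
    d12 = [kl_gauss(*k[0], *k[1]) for k in kernels]
    d21 = [kl_gauss(*k[1], *k[0]) for k in kernels]
    best12 = {a for a, d in enumerate(d12) if math.isclose(d, max(d12))}
    best21 = {a for a, d in enumerate(d21) if math.isclose(d, max(d21))}
    return best12.isdisjoint(best21)


# Hypothetical kernels: each action is the better one for a different hypothesis.
asymmetric = [((0.0, 1.0), (1.0, 4.0)), ((0.0, 4.0), (1.0, 1.0))]
# Hypothetical kernels: the first action is better in both directions.
dominated = [((0.0, 1.0), (1.0, 1.0)), ((0.0, 4.0), (1.0, 4.0))]
print(has_log_adaptivity_gain(asymmetric), has_log_adaptivity_gain(dominated))
```

The second configuration mirrors the situation in which one sensing action is uniformly more informative, where no logarithmic adaptivity gain is expected.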
The problem of passive binary hypothesis testing ($K = 1$,
$M = 2$) with fixed-length (non-sequential) as well as variable-length
(sequential) sample size has been studied in [14],
[15], [22], [29]. Our sequentiality gain, in this case, is the
manifestation of the fact that "sequential tests are superior in
ensuring that both error probabilities decreasing at the best
possible exponential rates" [29].

Recently, the authors in [16] and [17] have studied the
problem of active binary hypothesis testing for fixed-length
and variable-length sample size, respectively. Our work complements
the findings in [17] by providing an asymptotically
optimal solution in a total cost (and Bayesian) sense as well as
establishing a non-zero sequentiality gain and a potentially non-zero
adaptivity gain. In [16], the error exponents corresponding to the
classes of NN and NA policies were fully characterized for the
problem of active binary hypothesis testing with fixed sample
size. In the Bayesian context, the result of [16] regarding the
error exponent of the class of NN policies coincides with
our Corollary 6, while the full characterization of the error
exponent corresponding to the class of NA policies in [16]
strengthens Corollary 7 in the binary case. In particular, it is
shown that in the binary hypothesis testing setup $E_{NN} = E_{NA}$,
hence establishing zero adaptivity gain among non-sequential
policies. For the special case of channel coding with
feedback³ with two messages, the above result, i.e., the zero
adaptivity gain among non-sequential policies, was established
in [31], [32].



B. Numerical Example 

Consider the active binary hypothesis testing problem with
additive Gaussian noisy observations under two actions a and
b shown in Fig. 1. In this example, actions a and b add unequal
amounts of observation noise under the two hypotheses. In the
remainder of this subsection, we compare the performance of all
considered policies for this example.



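[Figure 1: sensing actions a and b with additive Gaussian observation noise. Legible distribution labels: N(0,1) under sensing action a and N(0,4) under sensing action b.]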



Fig. 1 . Active binary hypothesis testing problem with additive Gaussian noisy 
observations. 



The problem of channel coding with feedback can be interpreted as a
special case of active hypothesis testing (see [30] for more details).



Table II compares the performance bounds of the considered
policies for the example of Fig. 1.

TABLE II

Comparison of performance bounds for the example of Fig. 1

                  Sequential       Non-sequential
Adaptive          log L / 2.98     < log L / 1.89
Non-adaptive      log L / 2.27     2 log L / 1.78
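To make the comparison concrete, the leading-order terms in Table II can be evaluated for a given penalty $L$. The sketch below only re-uses the constants printed in the table; the dictionary and function names are hypothetical, the natural logarithm is an assumption, and the non-sequential adaptive entry is a bound rather than an exact characterization.

```python
import math

# Coefficient multiplying log L for each policy class, as printed in Table II.
# Key: (sequentiality, adaptivity). The ("non-sequential", "adaptive") entry
# is listed in the table as a bound, not as an exact leading term.
TABLE_II_COEFF = {
    ("sequential", "adaptive"): 1.0 / 2.98,
    ("sequential", "non-adaptive"): 1.0 / 2.27,
    ("non-sequential", "adaptive"): 1.0 / 1.89,
    ("non-sequential", "non-adaptive"): 2.0 / 1.78,
}

def leading_order_cost(policy, L):
    """Leading-order term coefficient * log L (natural log assumed)."""
    return TABLE_II_COEFF[policy] * math.log(L)

def relative_to_sequential_adaptive(policy):
    """Ratio of a policy's leading term to that of sequential adaptive policies."""
    return TABLE_II_COEFF[policy] / TABLE_II_COEFF[("sequential", "adaptive")]
```

The ratio returned by `relative_to_sequential_adaptive` does not depend on $L$ or on the base of the logarithm, which makes it a convenient summary of the table.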



VI. Discussion and Future Work 

In this paper, we considered the problem of active hypothesis 
testing and we analyzed the gain of sequential and adaptive 
selection of actions. 

Our analysis assumes two technical conditions. However, it
seems to us that Assumption 2 is needed only to simplify our proofs. As
part of our future work, we believe that standard techniques
as in [33], [34] can be applied to generalize the bounds when
Assumption 2 does not hold. We also note the results obtained
in im and m, 1231 have been shown in 153 and l36ll . 
respectively, to extend to higher moment characterization of
the optimal (sequential) sample size. A similar extension in the
context of sequential and non-adaptive policies seems to follow
naturally and is an important area of future investigation.

In our analysis in this paper, we only investigated asymptotic
performance in $L$, and the complementary role of asymptotic
analysis in $M$ was neglected. In particular, we have only
identified the zero-rate characterization of the error exponent,
while a full characterization in which the error exponent is
traded off with the information acquisition rate would require
an asymptotic characterization of the problem in both $L$ and
$M$. Although we have partially addressed this problem in [20]
for the class of sequential policies, the full characterization of
the performance bounds in $L$ and $M$ for all types of policies
defined in this paper remains an important area of future work.

Appendix 

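A. Theorem 1: non-sequential non-adaptive policy
In this subsection, we show that

$$V_{NN}(\rho) \le \frac{\log L - \min_{i\neq j} \log\frac{\rho_i}{\rho_j}}{D} + o(\log L), \tag{18}$$

$$V_{NN}(\rho) \ge \frac{\log L - \max_{i\neq j} \log\frac{\rho_i}{\rho_j}}{D} - o(\log L), \tag{19}$$

where

$$D = \max_{\lambda\in\Lambda(\mathcal{A})} \min_{i\in\Omega} \min_{j\neq i} \max_{\alpha\in[0,1]} \sum_{a\in\mathcal{A}} \lambda_a (1-\alpha) D_\alpha(q_i^a\|q_j^a). \tag{20}$$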
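As an illustration of the max-min-max structure in (20), the exponent $D$ can be approximated by a grid search in a small binary example. The sketch below assumes that $D_\alpha$ denotes the standard Rényi divergence of order $\alpha$, for which two Gaussians with common variance $\sigma^2$ and mean gap $\Delta$ satisfy $(1-\alpha) D_\alpha = \alpha(1-\alpha)\Delta^2/(2\sigma^2)$. The two-action setup, the grid resolutions, and all names are hypothetical. Because this expression is symmetric in the two hypotheses, the inner minimization over the pairs $(i,j)$ is trivial here.

```python
def chernoff_term_equal_var(delta, var, alpha):
    """(1 - alpha) * D_alpha between N(m, var) and N(m + delta, var),
    assuming the standard Renyi divergence of order alpha."""
    return alpha * (1.0 - alpha) * delta ** 2 / (2.0 * var)

def exponent_d_two_actions(actions, lam_grid=20, alpha_grid=100):
    """Grid approximation of D in (20) for M = 2 and two actions.

    actions: list of two (delta, var) pairs, one per sensing action.
    """
    best = 0.0
    for k in range(lam_grid + 1):
        lam = (k / lam_grid, 1.0 - k / lam_grid)
        # Inner maximization over alpha for this randomized rule lam.
        inner = max(
            sum(l * chernoff_term_equal_var(d, v, m / alpha_grid)
                for l, (d, v) in zip(lam, actions))
            for m in range(alpha_grid + 1)
        )
        best = max(best, inner)
    return best
```

For example, with actions `[(1.0, 1.0), (1.0, 4.0)]` the search puts all weight on the first (less noisy) action and selects $\alpha = 1/2$.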



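Suppose $\lambda \in \Lambda(\mathcal{A})$ achieves the maximum in (20). Let
$\pi_{NN}$ be a non-sequential non-adaptive policy that collects $n$
observation samples and selects sensing actions according to
the randomized rule $\lambda$. The expected total cost under this
policy is $n + L P_e$. Next we find an upper bound for $P_e$. Let
$\mathcal{Z}_i(n) = \{Z^n : \rho_i(n) \ge \rho_j(n) \text{ for all } j \in \Omega\}$ and
$e_{ij}(n) = P(\{Z^n : \rho_i(n) < \rho_j(n)\} \mid \theta = i)$. Then

$$P_e = \sum_{i=1}^{M} \rho_i P\Big(\bigcup_{j\neq i} \{Z^n : \rho_i(n) < \rho_j(n)\} \,\Big|\, \theta = i\Big) \le \sum_{i=1}^{M} \rho_i \sum_{j\neq i} e_{ij}(n) \le (M-1) \max_{i\neq j} e_{ij}(n). \tag{21}$$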



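From (21) and Lemma 1 in Appendix E we obtain

$$P_e \le (M-1) \times \exp\Big(-n(1-\alpha) \sum_{a\in\mathcal{A}} \lambda_a D_\alpha(q_i^a\|q_j^a) - \min_{k\neq l} \log\frac{\rho_k}{\rho_l} + o(n)\Big).$$

We can select $n$ as

$$n = \Big[\log L + \log(M-1) - \min_{i\neq j} \log\frac{\rho_i}{\rho_j} + o(\log L)\Big] / D \tag{22}$$

such that $P_e = O(\frac{1}{L})$, and hence,

$$V_{NN} \le n + L P_e \le n + 1 = \frac{\log L - \min_{i\neq j} \log\frac{\rho_i}{\rho_j}}{D} + o(\log L).$$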
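The choice of $n$ in (22) can be checked numerically at leading order. The sketch below drops the $o(\cdot)$ terms, which is an assumption made purely for illustration, and uses hypothetical function names. It verifies that with this sample size the leading-order bound on $P_e$ is at most $1/L$, so that the penalty term $L P_e$ contributes at most a constant to the total cost.

```python
import math

def nn_sample_size(L, M, D, min_log_prior_ratio):
    """Sample size from (22) with the o(log L) term dropped, rounded up."""
    return math.ceil((math.log(L) + math.log(M - 1) - min_log_prior_ratio) / D)

def nn_error_bound(n, M, D, min_log_prior_ratio):
    """Leading-order bound (M - 1) * exp(-n * D - min log prior ratio),
    i.e. the upper bound on P_e with the o(n) term dropped."""
    return (M - 1) * math.exp(-n * D - min_log_prior_ratio)

def nn_total_cost_bound(L, M, D, min_log_prior_ratio):
    """Leading-order bound n + L * P_e on the expected total cost."""
    n = nn_sample_size(L, M, D, min_log_prior_ratio)
    return n + L * nn_error_bound(n, M, D, min_log_prior_ratio)
```

For a uniform prior the minimum log prior ratio is zero, so `nn_sample_size(1000, 2, 0.5, 0.0)` is driven entirely by $\log L / D$.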

This completes the proof of the upper bound. Next, the proof of
the lower bound is given.

Consider a policy $\pi_{NN}$ that collects $n$ observation samples
according to $\lambda \in \Lambda(\mathcal{A})$. We have

$$P_e = \sum_{k=1}^{M} \rho_k P\Big(\bigcup_{l\neq k} \{Z^n : \rho_k(n) < \rho_l(n)\} \,\Big|\, \theta = k\Big) \ge \rho_i e_{ij}(n) + \rho_j e_{ji}(n) \quad \text{for any } i, j \in \Omega. \tag{23}$$

From (23) and Lemma 1 in Appendix E, a lower bound is
obtained for the expected total cost under policy $\pi_{NN}$. The
lower bound for $V_{NN}$ is obtained by minimizing over the
choices of $n$ and $\lambda$.

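B. Theorem 2: sequential non-adaptive policy
In this subsection, we show that

$$V_{SN}(\rho) \le \min_{\lambda\in\Lambda(\mathcal{A})} \sum_{i=1}^{M} \rho_i \frac{\log L - \min_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda)} + o(\log L), \tag{24}$$

$$V_{SN}(\rho) \ge \min_{\lambda\in\Lambda(\mathcal{A})} \sum_{i=1}^{M} \rho_i \frac{\log L - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda)} - o(\log L). \tag{25}$$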

In contrast to the passive case, the observations in the
active case (either adaptive or non-adaptive) are not necessarily
identically distributed over time. Therefore the analysis of [13] for sequential
passive hypothesis testing (which is based on the law of large
numbers and results for random walks) is not applicable to the
problem of sequential non-adaptive hypothesis testing.



Suppose $\lambda \in \Lambda(\mathcal{A})$ achieves the minimum in (24). The
upper bound (24) is achieved by a policy that selects sensing
actions according to $\lambda$ and stops sampling at

$$\tau := \min\{n : \max_{i\in\Omega} \rho_i(n) \ge 1 - L^{-1}\}.$$
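A minimal simulation sketch of this stopping rule is given below. It assumes Gaussian observation kernels chosen only for illustration (they are not claimed to be those of Fig. 1), a Bayes update of the posterior after each sample, and a safety cap on the number of samples; all names are hypothetical.

```python
import math
import random

def gaussian_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def run_sequential_nonadaptive(prior, kernels, lam, true_hyp, L, rng, max_steps=100000):
    """Draw actions i.i.d. from lam and stop once max_i rho_i(n) >= 1 - 1/L.

    kernels[a][i] is the (mean, var) of the observation under action a and
    hypothesis i. Returns the stopping time and the final posterior.
    """
    post = list(prior)
    actions = list(lam.keys())
    weights = [lam[a] for a in actions]
    n = 0
    while max(post) < 1.0 - 1.0 / L and n < max_steps:
        a = rng.choices(actions, weights=weights)[0]   # non-adaptive: ignores past data
        mean, var = kernels[a][true_hyp]
        z = rng.gauss(mean, math.sqrt(var))            # sample under the true hypothesis
        unnorm = [p * gaussian_pdf(z, *kernels[a][i]) for i, p in enumerate(post)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]             # Bayes update of the posterior
        n += 1
    return n, post

# Hypothetical kernels (assumption): kernels[action][hypothesis] = (mean, var)
KERNELS = {"a": [(0.0, 1.0), (1.0, 1.0)], "b": [(0.0, 1.0), (0.0, 4.0)]}
```

A call such as `run_sequential_nonadaptive([0.5, 0.5], KERNELS, {"a": 0.5, "b": 0.5}, 0, 100.0, random.Random(0))` returns a random stopping time; averaging it over many runs gives an empirical estimate of $E[\tau \mid \theta = i]$ for this hypothetical instance.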

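Let $\tau_i$, $i \in \Omega$, be Markov stopping times defined as follows:

$$\tau_i := \min\Big\{n : \min_{j\neq i} \frac{\rho_i(n)}{\rho_j(n)} \ge \frac{1 - L^{-1}}{L^{-1}/(M-1)}\Big\}. \tag{26}$$

Note that by definition

$$(M-1)\rho_i(\tau_i) \ge \sum_{j\neq i} \rho_j(\tau_i) \frac{1 - L^{-1}}{L^{-1}/(M-1)} = (M-1)(1 - \rho_i(\tau_i)) \frac{1 - L^{-1}}{L^{-1}}.$$

This implies that $\rho_i(\tau_i) \ge 1 - L^{-1}$ and hence, $\tau \le \tau_i$ for
all $i \in \Omega$. The total cost under the above policy can be
written as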

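$$V(\rho) = E[\tau] + L\,E[1 - \max_{i\in\Omega} \rho_i(\tau)] \le E[\tau] + 1 = \sum_{i=1}^{M} \rho_i E[\tau \mid \theta = i] + 1 \le \sum_{i=1}^{M} \rho_i E[\tau_i \mid \theta = i] + 1, \tag{27}$$

where $\rho = [\rho_1, \rho_2, \ldots, \rho_M] = [\rho_1(0), \rho_2(0), \ldots, \rho_M(0)]$ and
the last inequality follows from the fact that $\tau \le \tau_i$, $\forall i \in \Omega$.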

Next we find an upper bound for $E[\tau_i \mid \theta = i]$, $i \in \Omega$. Before
we proceed, we introduce the following notation to facilitate
the proof:

L V(A/ - 1) k^t Pk 

Let i := (logL)~3. We have 

00 

(a) J". 

< ^ +o(logL) 



log L — min log — 



(28) 



where inequality (a) follows from the fact that i = (logL)~3 
and by Lemma 2 in Appendix E. Now from (27) and (28), we
have the assertion of the theorem.

Next we provide the proof of the lower bound (25), which
follows closely the proof of Theorem 2 in [1].

From upper bound (24) we know that the total cost under
the optimal policy is $O(\log L)$. This implies that the $P_e$ of the
optimal policy is $O(\frac{\log L}{L})$. Hence, without loss of generality
in our computation of the lower bound, we can restrict the set
of policies to those whose average probability of making an
error is $O(\frac{\log L}{L})$.

Let $\pi_{SN}$ denote a sequential policy that selects sensing
actions according to $\lambda \in \Lambda(\mathcal{A})$ and stops sampling whenever
$P_e \le \epsilon$. For all $i \in \Omega$, let



log - — max log — 
R{i,\)+5 



(1 - 5) 
Under policy ttsn, 

p{{t < T,} \e = i) 

^p({r<T,}nf]\ 



(29) 



(30) 



where (a) follows from Lemma 4 in Appendix E and the union
bound; and (b) follows from Lemma 3 in Appendix E.

The expected total cost under policy $\pi_{SN}$ is lower bounded
as



M 



E[r] +iPe > ^p,E[r|6l = i] 

i=l 
AI 

1=1 
M 

>Y,p,T,p{T>T,\e = i) 



i=l 
M 



i=l 



mm^eo Pj 



For S = (log - ) 4, the lower bound simplifies to 



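$$E[\tau] + L P_e \ge \sum_{i=1}^{M} \rho_i \frac{\log\frac{1}{\epsilon} - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda)} - o\Big(\log\frac{1}{\epsilon}\Big) \ge \sum_{i=1}^{M} \rho_i \frac{\log L - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda)} - o(\log L),$$

where the last inequality follows from the fact that for an
optimal policy, $\epsilon = O(\frac{\log L}{L})$. The lower bound for $V_{SN}$ is
obtained by minimizing over the choice of $\lambda$.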



C. Theorem 3: sequential adaptive policy
We have

$$V_{SA}(\rho) \le \sum_{i=1}^{M} \rho_i \frac{\log L - \min_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda_i^*)} + o(\log L), \tag{31}$$

$$V_{SA}(\rho) \ge \sum_{i=1}^{M} \rho_i \frac{\log L - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda_i^*)} - o(\log L). \tag{32}$$

The upper bound was proved in [20, Prop. 3]. The proof of
the lower bound relies on a generalization of Theorem 2 in [1]
and is provided next.

From upper bound (31) we know that the total cost under
the optimal policy is $O(\log L)$. This implies that the error
probability $P_e$ of the optimal policy is $O(\frac{\log L}{L})$.

Let $\pi_{SA}$ denote a sequential policy that stops sampling
whenever $P_e \le \epsilon$. For all $i \in \Omega$, let



T* := (1 - 5) 

Under policy ttsa, 
P{{t < T*} \e = ^) 



log - — max log — 



(33) 



P 



((--•>"n{^^(Vii-.) 

pi{r<T:}n[ji 



7^* (52 



PAr) 



T*5'^ ' '^'^ ^'^ \pi ' miuj^ipj " ^^^^ 

where (a) follows from Lemma 5 in Appendix E and the union
bound; and (b) follows from Lemma 3 in Appendix E.

The expected total cost under policy $\pi_{SA}$ is lower bounded
as

A/ 



E[t]+ LPe>J2pM'r\S ^i] 

1=1 

M 

= '^Pi^[T'i-{T>T*} +Tl{r<T-}|6' = i] 

i=\ 
M 

>Y^p{T:p{T>Ti\d^i) 



M 



mmjgfj Pj 



For (5 = (log i) 4, the lower bound simplifies to 



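$$E[\tau] + L P_e \ge \sum_{i=1}^{M} \rho_i \frac{\log\frac{1}{\epsilon} - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda_i^*)} - o\Big(\log\frac{1}{\epsilon}\Big) \ge \sum_{i=1}^{M} \rho_i \frac{\log L - \max_{j\neq i} \log\frac{\rho_i}{\rho_j}}{R(i,\lambda_i^*)} - o(\log L),$$

where the last inequality follows from the fact that for an
optimal policy, $\epsilon = O(\frac{\log L}{L})$.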

Remark 7. The result above is in agreement with Theorem 2 
in Q and Theorem 4 in ifTSll. 



D. Proposition 1: non-sequential adaptive policy

In this subsection, we show that 

log L — max log — 

Vna{,p) > —. ^7— TT " 

mm max Hn, A) 

Proof: 

Let $\pi_{NA}$ be a non-sequential adaptive policy that collects $n$
observation samples. Consider an arbitrary $\delta > 0$ and let

1 



exp ( n{R{i, A*) + 5) + max log ] + 1 



We have 



M 



Pe > Y.P'^^'^ " 1^ = i]P{Zr\e - i), (35) 



where 



E[l - p,{n)\9 = i]> e,P(l - p,{n) > e,\e = i). (36) 

Let j = argmiii^" „ E[log ^^^j \9 = i] where actions 
{A{t)}'^^Q are selected according to ttna- 

P{1 - p^{n) < €,\9 = i) 
= pflog-^^>logl^|0 = 



1 - pi{n) 

'<'pf/iog£44-E[i„g£44i 



1 - e 



> log — — maxlog nR(i, A*) > 16* = j 

(&) 



< exp(-n(57(logO'), 
where (a) follows from the fact that given {9 = i}. 



(37) 



E[log ^] = log + ^ E[log . 



Pi 



< maxlog ^ + jimin ^ \lDiq1\\q]), (38) 



Similarly, it can be shown that 

P{Z^\9 = z) < exp(-7i(i?(*,A,*))V(logO')- (39) 

Combining (35)-(39) and minimizing the bound over $n$, we
have the assertion of the proposition. ∎

E. Technical Background 

In this appendix, we provide some preliminary facts and 
lemmas which are technical and only helpful in proving the 
main results of the paper. 

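Fact 2 (Kolmogorov's Maximal Inequality [37]). Suppose
$X_t$, for $t = 1, 2, \ldots$, are independent random variables with
$E[X_t] = 0$ and $\mathrm{Var}(X_t) < \infty$. Let $S_n = \sum_{t=1}^{n} X_t$. Then

$$P\Big(\max_{0\le n\le N} |S_n| \ge x\Big) \le \frac{\mathrm{Var}(S_N)}{x^2} = \frac{\sum_{t=1}^{N} \mathrm{Var}(X_t)}{x^2}.$$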
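As a sanity check of Fact 2, the left-hand side can be computed exactly for a simple symmetric random walk, where each step is $\pm 1$ with probability $1/2$ (zero mean, unit variance), and compared with the bound $N/x^2$. The sketch below is only an illustration for this special case with an integer threshold; the function name is hypothetical.

```python
def prob_max_abs_walk_at_least(num_steps, threshold):
    """Exact P(max_{0<=n<=N} |S_n| >= threshold) for a +/-1 random walk.

    threshold is assumed to be a positive integer.
    """
    if threshold <= 0:
        return 1.0
    # dist[pos] = probability of being at pos without having reached the threshold
    dist = {0: 1.0}
    hit = 0.0
    for _ in range(num_steps):
        new_dist = {}
        for pos, p in dist.items():
            for step in (-1, 1):
                nxt = pos + step
                if abs(nxt) >= threshold:
                    hit += 0.5 * p          # path absorbed at the threshold
                else:
                    new_dist[nxt] = new_dist.get(nxt, 0.0) + 0.5 * p
        dist = new_dist
    return hit
```

For `num_steps = 20` and `threshold = 10` the bound from Fact 2 is $20/10^2 = 0.2$, and the exact probability returned by the function lies below it.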



Fact 3 (McDiarmid's Inequality [38]). Let $X = (X_1, \ldots, X_n)$
be a family of independent random variables with $X_k$ taking
values in a set $\mathcal{X}_k$ for each $k$. Suppose a real-valued function
$f$ defined on $\prod_{k=1}^{n} \mathcal{X}_k$ satisfies $|f(x) - f(x')| \le c_k$ whenever
the vectors $x$ and $x'$ only differ in the $k$-th coordinate. Then
for any $\nu > 0$,

$$P(f(X) - E[f(X)] \ge \nu) \le e^{-2\nu^2 / \sum_{k=1}^{n} c_k^2},$$
$$P(f(X) - E[f(X)] \le -\nu) \le e^{-2\nu^2 / \sum_{k=1}^{n} c_k^2}.$$
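Fact 3 can be illustrated on the simplest bounded-difference function: the sum of $n$ independent fair coin flips, for which $c_k = 1$ for every coordinate and the upper tail is available in closed form. The sketch below compares the exact tail with the bound; it covers only this special case, and the function names are hypothetical.

```python
import math

def fair_coin_upper_tail(n, k_min):
    """Exact P(S >= k_min) for S the number of heads in n fair coin flips."""
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

def mcdiarmid_bound(nu, c):
    """Right-hand side exp(-2 nu^2 / sum_k c_k^2) of Fact 3."""
    return math.exp(-2.0 * nu ** 2 / sum(ck ** 2 for ck in c))
```

With $n = 20$ and $\nu = 5$, the event $f(X) - E[f(X)] \ge \nu$ is $\{S \ge 15\}$, and `fair_coin_upper_tail(20, 15)` can be compared against `mcdiarmid_bound(5, [1] * 20)`.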

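Lemma 1. Consider a policy that collects observation samples
according to a randomized rule $\lambda$. Under this policy and for
all $i, j \in \Omega$, and $\alpha \in [0, 1]$,

$$\max\{e_{ij}(n), e_{ji}(n)\} \le \exp\Big(-n(1-\alpha) \sum_{a\in\mathcal{A}} \lambda_a D_\alpha(q_i^a\|q_j^a) - \min\Big\{\log\frac{\rho_i}{\rho_j}, \log\frac{\rho_j}{\rho_i}\Big\} + o(n)\Big),$$

$$\max\{e_{ij}(n), e_{ji}(n)\} \ge \exp\Big(-n(1-\alpha) \sum_{a\in\mathcal{A}} \lambda_a D_\alpha(q_i^a\|q_j^a) - \max\Big\{\log\frac{\rho_i}{\rho_j}, \log\frac{\rho_j}{\rho_i}\Big\} - o(n)\Big).$$

The proof of Lemma 1 follows closely the proof of Theorem 9 in [14].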



Lemma 2. Given any t > and for n > — — ^(1 + t), we 

R,{i,X) ^ ' 

have P{{t, > n}\9 = i) < [M - l)e-''(')" where 



2t2 (R{i,\) 



(1 + 02 V21oge 



and (6) follows from Fact [3] 



Proof of Lemma 2:
Let $B_{ij}(n)$ be an event in the probability space defined as
follows:

$$B_{ij}(n) := \Big\{\log\frac{\rho_i(n)}{\rho_j(n)} < \log\frac{1 - L^{-1}}{L^{-1}/(M-1)}\Big\}.$$

By construction



policy TT. We have 



$$P(\{\tau_i > n\} \mid \theta = i) \le P\Big(\bigcup_{j\neq i} B_{ij}(n) \,\Big|\, \theta = i\Big) \le \sum_{j\neq i} P(B_{ij}(n) \mid \theta = i). \tag{40}$$



Furthermore, we have 



= p({log 



/'^N E[log^l< 



P loe 



'({ 
log — 

<p({log 



log 

Pijn) _jj 

L-V(A/-1) 



Pj(n) 

L-V(M- 1) 
-E[log^l< 



E[log^ + Elog%(I)]}l^ 



log 



' log — ^ < 

min log n 



A{t) 

/ 

9j 



= p 



{log 



L-V(M-l) 
/'»N_E[logM4l<r,:-n 



Pj(n) 



R{i,\)]\ 
i?(i,A)}| 



(41) 



For any a,a ^ A and i, j G f2, we have 



log 



21og^. For fc = l,2,...,n, let Xu = log % 



% - log % 



< 



and 



, X„]. Define function /(X) = log ^ 



ELi-^fc = From gil, (ETJ, and Fact [51 and for 



n > 



i?,(i,A) 



(1 + t), we have 



P{{n>n}\e^i) 

< {M - 1) cxp I -2n 



< (M - 1) exp -n 



P(z,A) 
21oge 



1 Ti 

1 ^ 

nR{i,X) 



( R{i,\) 



(i + O^VaiogC 



Lemma 3. Consider a sequential policy $\pi$ that selects the
stopping time $\tau$ such that $P_e \le \epsilon$. For any $i, j \in \Omega$, we have



Pli^,<{-rA\0-i]<e^(- + - 
.PjiV e J / VP* Pj 



Proof: The proof follows closely the proof of Lemma 4
in [1]. Let $\hat{\theta} = d(A^\tau, Z^\tau)$ denote the final declaration under



P 



n 



{0 = t}\d 



< 



P 

•1m 



P {{O = ^}\0 = j) +P{{0^ 1^? = 

< {-/-'P {{e + 3] \Q = j)+P {{0 ^^}\0 = ^) 



e Pj Pi 



Pj Pi 
\Pi Pj 



where (a) follows from the fact that under policy $\pi$ and for
all $i \in \Omega$,



1 



k=l 



-Pe 



Pi 
e 

< — . 

Pi 



Lemma 4. Consider a sequential policy $\pi$ that selects sensing
actions according to $\lambda \in \Lambda(\mathcal{A})$ and selects the stopping time
$\tau$ such that $P_e \le \epsilon$. We have



P\{r<T,}nf]\ 



where $T_i$ is as defined in (29).

The proof of Lemma 4 follows closely the proof of Lemma 5
in [1].

Lemma 5. Consider a sequential policy $\pi$ that selects the
stopping time $\tau$ such that $P_e \le \epsilon$. We have



pi{r<Tnnf]\ 



where $T_i^*$ is as defined in (33).

Proof: The proof follows closely the proof of Lemma 5
in [1]. We have

< P({ min : log ^ > {I - 5) log ^ Vj ^ z} < 



p 



\37i,0 <n<T; s.t. log-^ > 



Pj{n) 

(l-J)logl VjV*}|^ = ») 



(a) 



Piin)- 



< p(\j{3n,0<n<T* s.t. log^-E[log^J > 



< P 



(1 - 5) log i - max log — - nR{i, X*)]\e ^ i) 
Piin) 



[j |3n,0 <n<T* s.t. log 



Pj{n) 



[log 44] > 



< >^ P I max 

V 0<)i<T' 



w 7;*(iogC)^ 



log - E log — ^ ^ > 

L Piin] pAn) J 



where (a) follows from (38); (b) follows from the definition
of $T_i^*$ and the fact that $n \le T_i^*$; and (c) follows from Fact 2.